Goto

Collaborating Authors

 Aswan Governorate


Language Model Tokenizers Introduce Unfairness Between Languages

Neural Information Processing Systems

Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tok-enization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.


The Longest Solar Eclipse for 100 Years Is Coming. Don't Miss It

WIRED

The Longest Solar Eclipse for 100 Years Is Coming. NASA has announced when the longest total solar eclipse of the century will occur--and you won't have to wait long. Here's what you should know. The duration of a total solar eclipse always varies. In April 2024, the eclipse that crossed North America lasted 4 minutes and 28 seconds.


Multi-Objective Reinforcement Learning for Water Management

Osika, Zuzanna, Rădulescu, Roxana, Salazar, Jazmin Zatarain, Oliehoek, Frans, Murukannaiah, Pradeep K.

arXiv.org Artificial Intelligence

Many real-world problems (e.g., resource management, autonomous driving, drug discovery) require optimizing multiple, conflicting objectives. Multi-objective reinforcement learning (MORL) extends classic reinforcement learning to handle multiple objectives simultaneously, yielding a set of policies that capture various trade-offs. However, the MORL field lacks complex, realistic environments and benchmarks. We introduce a water resource (Nile river basin) management case study and model it as a MORL environment. We then benchmark existing MORL algorithms on this task. Our results show that specialized water management methods outperform state-of-the-art MORL approaches, underscoring the scalability challenges MORL algorithms face in real-world scenarios.




A Survey of Pun Generation: Datasets, Evaluations and Methodologies

Su, Yuchen, Zhu, Yonghua, Wang, Ruofan, Huang, Zijian, Benavides-Prado, Diana, Witbrock, Michael

arXiv.org Artificial Intelligence

Pun generation seeks to creatively modify linguistic elements in text to produce humour or evoke double meanings. It also aims to preserve coherence and contextual appropriateness, making it useful in creative writing and entertainment across various media and contexts. Although pun generation has received considerable attention in computational linguistics, there is currently no dedicated survey that systematically reviews this specific area. To bridge this gap, this paper provides a comprehensive review of pun generation datasets and methods across different stages, including conventional approaches, deep learning techniques, and pre-trained language models. Additionally, we summarise both automated and human evaluation metrics used to assess the quality of pun generation. Finally, we discuss the research challenges and propose promising directions for future work.


Personalized Reasoning: Just-In-Time Personalization and Why LLMs Fail At It

Li, Shuyue Stella, Bose, Avinandan, Brahman, Faeze, Du, Simon Shaolei, Koh, Pang Wei, Fazel, Maryam, Tsvetkov, Yulia

arXiv.org Artificial Intelligence

Current large language model (LLM) development treats task-solving and preference alignment as separate challenges, optimizing first for objective correctness, then for alignment to aggregated human preferences. This paradigm fails in human-facing applications where solving a problem correctly is insufficient if the response mismatches the user's needs. This challenge intensifies in just-in-time scenarios where no prior user interaction history exists due to cold-start conditions or privacy constraints. LLMs need to identify what they don't know about user preferences, strategically elicit preference values through questioning, then adapt their reasoning processes and responses accordingly -- a complicated chain of cognitive processes which we term personalized reasoning. We introduce PREFDISCO, an evaluation methodology that transforms static benchmarks into interactive personalization tasks using psychologically-grounded personas with sparse preferences. Our framework creates scenarios where identical questions require different reasoning chains depending on user context, as optimal explanation approaches vary by individual expertise and preferences while maintaining factual accuracy. Evaluation of 21 frontier models across 10 tasks reveals 29.0% of naive personalization attempts produce worse preference alignment than generic responses, yet generic responses also fail to serve individual user needs effectively. These findings suggest personalized reasoning requires dedicated development rather than emerging naturally. PREFDISCO establishes personalized reasoning as a measurable research frontier and reveals fundamental limitations in current LLMs' interactive capabilities, providing a foundation for developing systems that can adapt to individual users in education, healthcare, and technical domains where personalization is critical.


Real-Time Fish Detection in Indonesian Marine Ecosystems Using Lightweight YOLOv10-nano Architecture

Wuntu, Jonathan, Putro, Muhamad Dwisnanto, Syahputra, Rendy

arXiv.org Artificial Intelligence

Indonesia's marine ecosystems, part of the globally recognized Coral Triangle, are among the richest in biodiversity, requiring efficient monitoring tools to support conservation. Traditional fish detection methods are time-consuming and demand expert knowledge, prompting the need for automated solutions. This study explores the implementation of YOLOv10-nano, a state-of-the-art deep learning model, for real-time marine fish detection in Indonesian waters, using test data from Bunaken National Marine Park. YOLOv10's architecture, featuring improvements like the CSPNet backbone, PAN for feature fusion, and Pyramid Spatial Attention Block, enables efficient and accurate object detection even in complex environments. The model was evaluated on the DeepFish and OpenImages V7-Fish datasets. Results show that YOLOv10-nano achieves a high detection accuracy with mAP50 of 0.966 and mAP50:95 of 0.606 while maintaining low computational demand (2.7M parameters, 8.4 GFLOPs). It also delivered an average inference speed of 29.29 FPS on the CPU, making it suitable for real-time deployment. Although OpenImages V7-Fish alone provided lower accuracy, it complemented DeepFish in enhancing model robustness. Overall, this study demonstrates YOLOv10-nano's potential for efficient, scalable marine fish monitoring and conservation applications in data-limited environments.


Predicting Traffic Accident Severity with Deep Neural Networks

Bibb, Meghan, Rivas, Pablo, Tayba, Mahee

arXiv.org Artificial Intelligence

Traffic accidents can be studied to mitigate the risk of further events. Recent advances in machine learning have provided an alternative way to study data associated with traffic accidents. New models achieve good generalization and high predictive power over imbalanced data. In this research, we study neural network-based models on data related to traffic accidents. We begin analyzing relative feature colinearity and unsupervised dimensionality reduction through autoencoders, followed by a dense network. The features are related to traffic accident data and the target is to classify accident severity. Our experiments show cross-validated results of up to 92% accuracy when classifying accident severity using the proposed deep neural network.


When Alignment Hurts: Decoupling Representational Spaces in Multilingual Models

Elshabrawy, Ahmed, Kaing, Hour, Song, Haiyue, Aji, Alham Fikri, Tanaka, Hideki, Utiyama, Masao, Dabre, Raj

arXiv.org Artificial Intelligence

Alignment with high-resource standard languages is often assumed to aid the modeling of related low-resource varieties. We challenge this assumption by demonstrating that excessive representational entanglement with a dominant variety, such as Modern Standard Arabic (MSA) in relation to Arabic dialects, can actively hinder generative modeling. We present the first comprehensive causal study of this phenomenon by analyzing and directly intervening in the internal representation geometry of large language models (LLMs). Our key contribution is an online variational probing framework that continuously estimates the subspace of the standard variety during fine-tuning, enabling projection-based decoupling from this space. While our study uses Arabic as a case due to its unusually rich parallel resources across 25 dialects, the broader motivation is methodological: dialectal MT serves as a controlled proxy for generative tasks where comparable multi-variety corpora are unavailable. Across 25 dialects, our intervention improves generation quality by up to +4.9 chrF++ and +2.0 on average compared to standard fine-tuning, despite a measured tradeoff in standard-language performance. These results provide causal evidence that subspace dominance by high-resource varieties can restrict generative capacity for related varieties. More generally, we unify geometric and information-theoretic probing with subspace-level causal interventions, offering practical tools for improving generative modeling in closely related language families and, more broadly, for controlling representational allocation in multilingual and multi-domain LLMs.